Semi-Automatic Error Analysis for Large-Scale Statistical Machine Translation Systems
Abstract
This paper presents a general framework for semi-automatic error analysis in large-scale statistical machine translation (SMT) systems. The main objective is to relate characteristics of input documents, which can be either text or audio, to the system's overall translation performance, and thereby to identify particularly problematic input characteristics (e.g., source, genre, dialect). Various measurements of these factors are extracted from the input, either automatically or by human annotation, and are related to translation performance scores by means of mutual information. We apply this analysis to a state-of-the-art large-scale SMT system operating on Chinese and Arabic text and audio documents, and demonstrate how the proposed error analysis can help identify system weaknesses.
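The abstract does not say how performance scores are discretized or which estimator is used. As a minimal sketch, assuming per-document scores (e.g., an error rate such as TER) are split into equal-width bins and mutual information I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))] is estimated from empirical counts, ranking input characteristics could look like this. The function names, the bin count, and the toy data are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def bin_scores(scores, num_bins):
    """Map continuous per-document scores to equal-width bin indices 0..num_bins-1.

    Equal-width binning is an assumption; the paper does not specify a scheme.
    """
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / num_bins or 1.0  # avoid division by zero if all scores are equal
    return [min(int((s - lo) / width), num_bins - 1) for s in scores]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


if __name__ == "__main__":
    # Hypothetical toy data: one entry per input document.
    features = {
        "genre": ["news", "news", "blog", "blog", "news", "blog"],
        "source": ["web", "wire", "web", "wire", "wire", "web"],
    }
    ter = [0.30, 0.32, 0.55, 0.60, 0.28, 0.58]  # made-up per-document error rates
    ter_bins = bin_scores(ter, num_bins=2)
    # Rank characteristics by how much they reveal about the binned score;
    # here "genre" (1.0 bit) ranks above "source".
    ranking = sorted(
        features,
        key=lambda f: mutual_information(features[f], ter_bins),
        reverse=True,
    )
    for name in ranking:
        print(f"{name}: {mutual_information(features[name], ter_bins):.3f} bits")
```

A high mutual information value only says that a characteristic is informative about performance. To see whether it marks a weakness, one would still inspect which of its values co-occur with the high-error bins.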
Publication date: 2007